Self Hosted E-Receipts API

Introduction

Welcome to the BlinkReceipt Self Hosted API. This API allows you to take advantage of the full power of our E-Receipts API without any PII leaving your infrastructure.

Prerequisites

Install Docker
Contact your account representative for a starter Docker environment file pre-filled with credentials specific to your account.

Infrastructure Requirements

Before deploying the application, ensure the following infrastructure is provisioned and accessible to the containers:

PostgreSQL-compatible database instance
- Recommended: Amazon RDS (Aurora PostgreSQL)
- Create a database and a user with read/write access to it
Redis instance
- Recommended: Amazon ElastiCache (Redis)
AmazonMQ for RabbitMQ Instance
- Recommended: AmazonMQ with RabbitMQ engine
- Instance Type: mq.t3.micro (minimum)
- Engine version: 3.11 or later
- Deployment mode: Single-instance (for development) or Cluster (for production)
- Network access: Ensure your containers can connect to the broker endpoint
S3 Bucket for OTA data updates
- The application must have read access to this bucket.
- Actual's AWS account requires write access to push OTA updates.
IAM Role Configuration (AWS)
- Attach IAM role(s) to your ECS tasks or EC2 instance profiles with the necessary permissions to access the S3 bucket, DB, Redis, and AmazonMQ.

System Overview

The application is designed to run entirely within containers and supports four core roles, all of which are encapsulated in the same Docker image and differentiated by the ROLE environment variable:

API Service (`ROLE=API`)

Exposes the main HTTP endpoints.
Requires connectivity to the database, Redis, and RabbitMQ.
Should be scaled based on expected request traffic.

Worker Service (`ROLE=WORKER`)

Processes background jobs from RabbitMQ (e.g., data processing, extraction tasks).
Should be scaled depending on job volume and desired latency.
Requires connectivity to Redis and RabbitMQ.

DB Migration Script (`ROLE=MIGRATE_DB`)

Should be run once with every deployment. It creates the tables on the first run and applies any subsequent migrations as new versions of the Docker image are released.
Requires connectivity to the database.

OTA Update Cron (`ROLE=UPDATE_CRON`)

Run upon deployment and periodically (e.g., once per day).
Pulls updated data files from S3 and updates the database accordingly.
Requires connectivity to the database and S3.
Note: The updates are designed to be applied as a "hot swap" with minimal downtime, but using a DB like Aurora with read replicas will provide even more safety.

Environment Variables

The application expects a number of environment variables to be set. Many of these will be pre-populated by your account rep, and many RabbitMQ configuration parameters are handled automatically by the Docker image. The variables you are expected to populate are:

| Variable Name | Default | Description |
| --- | --- | --- |
| ROLE | | `API` - HTTP API service; `WORKER` - Extraction job processor; `MIGRATE_DB` - Apply DB migrations; `UPDATE_CRON` - Script to check for and apply OTA updates |
| REDIS_HOST | | Redis server hostname |
| REDIS_PORT | 6379 | Redis server port |
| REDIS_USER | | Redis username (optional) |
| REDIS_PASSWORD | | Redis password (optional) |
| REDIS_TLS | false | Whether to use TLS when connecting to Redis |
| REDIS_DB | 0 | The number of the Redis database to use |
| DB_HOST | | PostgreSQL server hostname |
| DB_PORT | 5432 | PostgreSQL server port |
| DB_NAME | | PostgreSQL database name |
| DB_USERNAME | | PostgreSQL username |
| DB_PASSWORD | | PostgreSQL password |
| DB_SSL | false | Controls whether the app's PostgreSQL connection uses SSL/TLS encryption |
| AWS_REGION | us-east-1 | AWS region for S3 and other services |
| DB_UPDATE_ROLE_ARN | | The IAM role ARN for accessing the S3 bucket which will contain OTA DB updates |
| DB_UPDATE_S3_BUCKET | | The name of the S3 bucket which will contain OTA DB updates |
| SCRAPE_WORKER_TIMEOUT_MS | 60000 | Timeout in milliseconds for each template to be processed |
| CPP_THREAD_POOL_SIZE | 50 | Maximum number of worker threads for handling concurrent requests |
| CPP_RETRY_MAX_ATTEMPTS | 10 | Maximum number of retry attempts before a request fails |
| RABBITMQ_MESSAGE_EXPIRE_MS | 60000 | Time in milliseconds before an unacked message expires in a queue. This should have the same value as SCRAPE_WORKER_TIMEOUT_MS |
| RABBITMQ_TEMPLATE_BATCH | 5 | The number of templates processed per batch. Increasing this requires more memory for each worker; we estimate that each template in a batch costs ~300 MB of memory |
| RABBITMQ_URL | amqp://localhost:5672 | RabbitMQ connection URL (use amqps:// for TLS) |
| NODE_MEMORY | 1900 | Value in MB used to set the Node.js `--max-old-space-size` option via `NODE_OPTIONS` |
| SCRAPE_QUEUE | scrape_queue | Name of the queue where workers listen for templates to process |
| HTTPS | false | Enables HTTPS support when set to "true"; requires `CERT_CN` |
| CERT_CN | | The `Common Name` for the TLS certificate that the app will auto-generate |
| ENABLE_SCAN_API | false | Allows the extraction API to send receipts to the OCR scanning service for specific merchants |
| OCR_SCAN_URL_ON_PREM | | Base URL of the OCR scanning service |
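The constraints stated in the table (valid `ROLE` values, matching timeouts, `HTTPS` requiring `CERT_CN`) can be checked before a container starts. The following is a minimal sketch, not part of the product: the `check_env` helper is hypothetical, and treating every variable without a default as required is an assumption, since some roles may not need all of them.

```python
import os

# Assumption: variables the table lists without a default are required.
# Optional Redis credentials, CERT_CN and OCR_SCAN_URL_ON_PREM are left out.
REQUIRED = [
    "ROLE", "REDIS_HOST", "DB_HOST", "DB_NAME", "DB_USERNAME",
    "DB_PASSWORD", "DB_UPDATE_ROLE_ARN", "DB_UPDATE_S3_BUCKET",
]
VALID_ROLES = {"API", "WORKER", "MIGRATE_DB", "UPDATE_CRON"}


def check_env(env):
    """Return a list of human-readable problems found in `env` (a mapping)."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    role = env.get("ROLE")
    if role and role not in VALID_ROLES:
        problems.append(f"ROLE={role!r} is not one of {sorted(VALID_ROLES)}")
    # The table says these two values should match; both default to 60000.
    expire = env.get("RABBITMQ_MESSAGE_EXPIRE_MS", "60000")
    timeout = env.get("SCRAPE_WORKER_TIMEOUT_MS", "60000")
    if expire != timeout:
        problems.append("RABBITMQ_MESSAGE_EXPIRE_MS != SCRAPE_WORKER_TIMEOUT_MS")
    # HTTPS=true requires CERT_CN.
    if env.get("HTTPS", "false") == "true" and not env.get("CERT_CN"):
        problems.append("HTTPS is true but CERT_CN is not set")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(problem)
```

Run as a script, it prints one line per problem found in the current environment and nothing when the checks pass.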

AmazonMQ for RabbitMQ Setup

To set up AmazonMQ for RabbitMQ, follow these steps:

1. Create AmazonMQ Broker

Navigate to AmazonMQ Console
- Go to the AmazonMQ console in your AWS account
- Click "Create broker"
Configure Broker Settings
- Engine type: RabbitMQ
- Engine version: 3.11.x or later (recommended)
- Deployment mode:
  - Single-instance for development/testing
  - Cluster for production (provides high availability)
- Instance type: mq.t3.micro (minimum) or larger based on your throughput requirements
Configuration
- Broker name: Choose a descriptive name (e.g., ereceipt-extraction-broker)
- Username and Password: Create credentials for the broker (you'll use these in RABBITMQ_URL)
Connectivity
- Virtual Private Cloud (VPC): Select the same VPC where your containers will run
- Subnet(s): Choose appropriate subnets
- Security groups: Create or select security groups that allow:
  - Inbound access on port 5671 (AMQP with TLS) or 5672 (AMQP without TLS)
  - Access from your container security groups

2. Configure Environment Variables

Once your AmazonMQ broker is created, you'll get an endpoint URL. Configure your environment variables:

# For TLS connection (recommended for production)
RABBITMQ_URL=amqps://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5671

# For non-TLS connection (development only)
RABBITMQ_URL=amqp://username:password@your-broker-id.mq.us-east-1.amazonaws.com:5672
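Broker passwords often contain characters such as `@` or `/` that would break the URL if pasted in as-is. As a small sketch, the hypothetical helper below assembles a value for `RABBITMQ_URL` and percent-encodes the credentials; the host name and password are placeholders.

```python
from urllib.parse import quote


def rabbitmq_url(username, password, host, tls=True):
    """Build an AMQP(S) URL, percent-encoding the credentials."""
    # Port 5671 is used for TLS (amqps), 5672 for plain AMQP.
    scheme, port = ("amqps", 5671) if tls else ("amqp", 5672)
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"{scheme}://{user}:{pw}@{host}:{port}"


# Placeholder values; substitute your own broker endpoint and credentials.
print(rabbitmq_url("username", "p@ss/word",
                   "your-broker-id.mq.us-east-1.amazonaws.com"))
# amqps://username:p%40ss%2Fword@your-broker-id.mq.us-east-1.amazonaws.com:5671
```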

3. Security Group Configuration

Ensure your security groups allow:

Outbound from your container security group to AmazonMQ security group on port 5671/5672
Inbound to AmazonMQ security group from your container security group on port 5671/5672

4. Network Connectivity

If using public subnets: Ensure your AmazonMQ broker has public access enabled
If using private subnets: Ensure proper routing between your container subnets and AmazonMQ subnets
For cross-AZ deployment: Consider placing your broker in multiple availability zones for resilience
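To confirm that the security group and routing settings above actually let a container reach the broker, a plain TCP check from inside the container network is often enough. This is a generic sketch using only the Python standard library; `can_connect` is a hypothetical helper and does not perform an AMQP handshake.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `can_connect("your-broker-id.mq.us-east-1.amazonaws.com", 5671)` returning `False` would suggest a security group, routing, or DNS problem rather than an application issue.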

Running a container

Quick Start

Download this Docker Compose file
Make sure your .env.client is in your project folder and all vars are populated
Decide how you will authenticate to AWS and set env vars / modify docker-compose-ereceipts.yml accordingly:
- Currently the docker compose file mounts your local ~/.aws folder into the relevant containers so that they can authenticate the same way you do locally
- It also sets the AWS_PROFILE env var which you should delete if you want it to use your default config, or override with a profile value of your choice
- If you prefer to authenticate using an access key ID and secret access key, you can remove the ~/.aws mounts from all containers and instead set the appropriate env vars (e.g., AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
Make sure to set your db-related env vars according to whatever values (user, pass, db name) you pass into the postgres container in docker-compose-ereceipts.yml

The Docker Compose file should serve as a blueprint for how the service is orchestrated in production.

Testing the API

Once the container is running, you can make requests using http[s]://localhost:4001 as the base URL. You can find the request and response structures in our API Spec.

Logging

Logs are written to stdout and can be collected or shipped as needed.

Production Deployment

Seeding DB

Make sure to run the DB Migration Script and the OTA Update Cron for each environment upon deployment, so that the DB has the structure and data the app needs to perform extraction.
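One way to script this seeding step is to run the same image twice with different `ROLE` values. The sketch below only builds the `docker run` argument lists. The image name is a placeholder, the `seeding_commands` helper is hypothetical, and running the migration before the OTA update is an assumption based on the migration being the step that creates the tables.

```python
def seeding_commands(image, env_file=".env.client"):
    """Return docker run argument lists for the two one-off seeding roles."""
    # Assumed order: create/upgrade tables first, then load OTA data.
    roles = ["MIGRATE_DB", "UPDATE_CRON"]
    return [
        ["docker", "run", "--rm", "--env-file", env_file, "-e", f"ROLE={role}", image]
        for role in roles
    ]


# "example/ereceipts:latest" is a placeholder, not the real image name.
for cmd in seeding_commands("example/ereceipts:latest"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failed migration stops the deployment before the update step runs.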

Optimizing Performance

For optimal performance in a Kubernetes or ECS deployment, we recommend starting with the following configuration:

API

vCPUs: 1
RAM: 1 GiB
Replicas: 6

Workers

vCPUs: 1.5
RAM: 2.5 GiB
Replicas: 50

Worker Env Vars

RABBITMQ_TEMPLATE_BATCH=5
NODE_MEMORY=1900

Expected throughput is ~1 request per minute (rpm) per worker, and average latency is ~5s. Scaling horizontally will improve throughput while maintaining the same latency characteristics.
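The figures above can be turned into a rough sizing calculation. This sketch assumes that "rpm" means requests per minute, that throughput scales linearly with the number of workers, and that the ~300 MB per template estimate from the environment variable table holds. Both helper names are hypothetical.

```python
import math

MB_PER_TEMPLATE = 300  # approximate per-template memory cost from the variable table


def batch_memory_mb(template_batch):
    """Approximate memory used by one batch of templates on a worker, in MB."""
    return template_batch * MB_PER_TEMPLATE


def workers_needed(target_rpm, rpm_per_worker=1.0):
    """Number of worker replicas for a target throughput, assuming linear scaling."""
    return math.ceil(target_rpm / rpm_per_worker)


print(batch_memory_mb(5))   # 1500
print(workers_needed(50))   # 50
```

With the recommended `RABBITMQ_TEMPLATE_BATCH` of 5, the estimated 1500 MB stays below the `NODE_MEMORY` value of 1900, which in turn fits within the 2.5 GiB allotted to each worker.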

Readiness & Health Checks

For orchestrators such as Kubernetes that support readiness and liveness probes, these are available at GET /readyz and GET /healthz respectively. These endpoints return status code 200 on success and 500 otherwise.
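Outside of an orchestrator, the same endpoints can be polled from a script, for example to wait for a local container to come up. This is a minimal standard-library sketch: the helper names are hypothetical, and the base URL would be something like `http://localhost:4001` for the Quick Start setup.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=5.0):
    """Return the HTTP status code of GET base_url + path, or None if unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 when the check reports a failure
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def is_ready(base_url):
    """Readiness check: GET /readyz returns 200."""
    return probe(base_url, "/readyz") == 200


def is_alive(base_url):
    """Liveness check: GET /healthz returns 200."""
    return probe(base_url, "/healthz") == 200
```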

Debugging

To debug data quality issues (e.g., wrong or missing fields), it is most helpful to provide us with the blinkReceiptId associated with the request, as well as the email that was passed in, so that we can attempt to reproduce the issue.
